Extremal properties of random trees 
We investigate extremal statistical properties such as the maximal and the minimal heights of randomly generated binary trees. By analyzing the master evolution equations we show that the cumulative distribution of extremal heights approaches a traveling wave form. The wave front in the minimal case is governed by the small-extremal-height tail of the distribution, and conversely, the front in the maximal case is governed by the large-extremal-height tail of the distribution. We determine several statistical characteristics of the extremal height distribution analytically. In particular, the expected minimal and maximal heights grow logarithmically with the tree size $N$, $h_{\min} \simeq v_{\min}\ln N$ and $h_{\max} \simeq v_{\max}\ln N$, with $v_{\min} = 0.373365\ldots$ and $v_{\max} = 4.31107\ldots$, respectively. Corrections to this asymptotic behavior are of order $\mathcal{O}(\ln\ln N)$.
PACS numbers: 02.50.-r, 05.40.-a, 89.20.Ff 
Random trees play an important role in data storage and retrieval algorithms in computer science. They also arise in physical situations such as collision processes in gases, random fragmentation processes, and diffusion-limited aggregation. In each case, extremal characteristics such as the maximal or the minimal height of the tree, namely, the maximal or the minimal number of bonds separating the tree root from a node, are of interest. In data storage algorithms, these distances yield the best-case or the worst-case-scenario performances. In kinetic theory, the largest Lyapunov exponent is related to the maximum height problem.

In this article, we study extremal properties of randomly generated binary trees using rate equation theory. Techniques developed in aggregation processes are well suited for treating random trees since the tree merger process is simply an aggregation process. We study the distributions of extremal (both minimal and maximal) heights of a tree. In both cases, the average extremal tree height grows logarithmically with the number of leaves and the cumulative distribution of extremal tree heights approaches a traveling wave solution. The logarithmic growth prefactors equal the traveling wave velocities, which are set by a velocity selection principle. These velocities can be alternatively obtained using a simpler (and independent) intuitive argument. Interestingly, the wave front in the minimal case is determined by the small-height tail of the distribution, while in the maximal case it is determined by the large-height tail of the distribution.

Let us introduce the tree generation model. Initially, the system consists of an infinite number of trivial (single-leaf) trees. Then, two trees are picked at random and attached to a common root. This merging process is repeated indefinitely with rate set to 2 without loss of generality. Let $c(t)$ be the number density of trees at time $t$. Initially, $c(0) = 1$, and since this quantity evolves according to $dc/dt = -c^2$ one has

$$c(t) = \frac{1}{1+t}. \tag{1}$$

Mass conservation implies that $N$, the average number of leaves in a tree, grows linearly with time, $N = c^{-1} = 1+t$. While the corresponding mass distribution has been extensively studied in coalescence processes, we are interested here in the distribution of extremal characteristics such as the minimal and maximal number of bonds between the tree root and its nodes. The leading behavior of the average of these distributions can be obtained from the following intuitive argument.

The distribution of tree heights, namely, of the distances between the tree root and the nodes, can be obtained immediately. Let $P_n(t)$ be the probability that the distance between a randomly chosen leaf and the root of the parent tree equals $n$ at time $t$. As the tree generation process is random, $P_n(t)$ obeys Poisson statistics

$$P_n(t) = \frac{[h(t)]^n}{n!}\,e^{-h(t)} \tag{2}$$

with $h(t)$ the average tree height. Consider a leaf in the system. Each time its corresponding tree merges with another tree, the distance to the root is augmented by one. This process occurs with rate 2 and hence, $dh/dt = 2c$. Integrating this equation subject to the initial condition $h(0) = 0$ yields $h(t) = 2\ln(1+t) = 2\ln N$. One anticipates that the expected minimal number grows logarithmically as well, $h_{\min} \simeq v_{\min}\ln N$. To estimate $h_{\min}$ we sum the small-$n$ tail of the normalized height distribution $c^{-1}P_n$ to unity, $\sum_{n=0}^{h_{\min}} c^{-1}P_n = 1$. Substituting Eqs. (1) and (2), the relation $h_{\min} \simeq v_{\min}\ln N$, and the Stirling formula $\ln n! \simeq n\ln n - n$ into the above relation we obtain the transcendental equation

$$v\,\ln\frac{2e}{v} = 1. \tag{3}$$
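The step from the tail sum to Eq. (3) can be made explicit. The following is a sketch to leading exponential order only, keeping the dominant term of the sum at $n = v\ln N$ and using $h = 2\ln N$, $c^{-1} = N$:

```latex
% Logarithm of the dominant term of the tail sum, evaluated at n = v ln N.
\begin{align*}
\ln\!\left[c^{-1}P_n\right]
 &= \ln N + n\ln h - h - \ln n! \\
 &\simeq \ln N + n\ln(2\ln N) - 2\ln N - n\ln n + n \\
 &= \ln N\,\bigl[-1 + v\ln 2 - v\ln v + v\bigr]
  = \ln N\,\Bigl[v\ln\frac{2e}{v} - 1\Bigr].
\end{align*}
```

The sum can be of order unity for large $N$ only if the bracket vanishes, which gives back Eq. (3).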
This equation has two solutions with the lower (higher) velocity corresponding to the growth of the average minimal (maximal) tree height. Indeed, repeating the above steps for the maximal height using $\sum_{n=h_{\max}}^{\infty} c^{-1}P_n = 1$ again leads to the same equation. Solving Eq. (3) yields

$$h_{\min} \simeq v_{\min}\ln N, \quad v_{\min} = 0.373365; \qquad h_{\max} \simeq v_{\max}\ln N, \quad v_{\max} = 4.31107. \tag{4}$$
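The two roots of Eq. (3) are easy to reproduce numerically. The sketch below uses plain bisection; the helper names `f` and `bisect` and the bracketing intervals are choices made here for illustration, not part of the original analysis.

```python
import math

def f(v):
    """Left-hand side of Eq. (3) minus one: v*ln(2e/v) - 1."""
    return v * math.log(2.0 * math.e / v) - 1.0

def bisect(func, lo, hi, tol=1e-12):
    """Plain bisection; assumes func(lo) and func(hi) have opposite signs."""
    flo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = func(mid)
        if (flo < 0) == (fmid < 0):
            lo, flo = mid, fmid   # root lies in the upper half
        else:
            hi = mid              # root lies in the lower half
    return 0.5 * (lo + hi)

# v*ln(2e/v) peaks (with value 2) at v = 2, so one root lies on each side of v = 2.
v_min = bisect(f, 1e-6, 2.0)
v_max = bisect(f, 2.0, 10.0)
print(v_min, v_max)
```

Splitting the search at $v = 2$ works because the derivative of $v\ln(2e/v)$ is $\ln(2/v)$, so the function is increasing below $v = 2$ and decreasing above it.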



This probabilistic argument correctly predicts both velocities, and additionally, it demonstrates that the two extremal statistics are intimately related. We note that the latter maximal height value has emerged from quite different calculations in studies of collision processes in gases and fragmentation processes.

We now turn to studying the entire distribution of extremal characteristics. Rather than considering the two extremal height distributions separately, we study a more general model which interpolates between the two cases. In this model, each tree carries an extremal height $k$. The result of a merger between trees with extremal heights $k_1$ and $k_2$ is a new tree with extremal height $k$ given by

$$k = \begin{cases} \min(k_1,k_2)+1 & \text{with prob. } p,\\ \max(k_1,k_2)+1 & \text{with prob. } 1-p. \end{cases} \tag{5}$$

Here, $p$ is a mixing parameter whose limits $p = 1$ and $p = 0$ correspond to the minimal and the maximal heights problems, respectively.
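The merger rule of Eq. (5) can be illustrated with a small Monte Carlo experiment. The sketch below is a finite-size stand-in for the infinite system described above: it starts from `m_leaves` trivial trees and merges uniformly chosen pairs until `n_final` trees remain, so the mean tree size is `m_leaves / n_final`. The function name, the tuple layout, and the stopping rule are assumptions made for illustration.

```python
import random

def simulate_forest(m_leaves, n_final, p, seed=0):
    """Merge random pairs of trees until n_final trees remain.

    Each tree is a tuple (k, size, min_depth, max_depth), where k follows
    Eq. (5) and min_depth / max_depth are the true extremal leaf depths.
    """
    rng = random.Random(seed)
    forest = [(0, 1, 0, 0) for _ in range(m_leaves)]
    while len(forest) > n_final:
        # pick two distinct trees uniformly at random
        i, j = rng.sample(range(len(forest)), 2)
        k1, s1, lo1, hi1 = forest[i]
        k2, s2, lo2, hi2 = forest[j]
        # Eq. (5): min rule with probability p, max rule otherwise
        if rng.random() < p:
            k = min(k1, k2) + 1
        else:
            k = max(k1, k2) + 1
        child = (k, s1 + s2, min(lo1, lo2) + 1, max(hi1, hi2) + 1)
        # remove both parents (larger index first), then add the merged tree
        for idx in sorted((i, j), reverse=True):
            forest.pop(idx)
        forest.append(child)
    return forest

forest = simulate_forest(m_leaves=4000, n_final=40, p=1.0, seed=1)
mean_k = sum(tree[0] for tree in forest) / len(forest)
print("mean extremal height at p = 1:", mean_k)
```

For $p = 1$ the tracked $k$ coincides with the true minimal leaf depth and for $p = 0$ with the true maximal leaf depth, which gives a simple consistency check on the rule.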

The number density of trees with extremal height $k$, $c_k(t)$, evolves according to the master equation

$$\frac{dc_k}{dt} = c_{k-1}^2 - 2c\,c_k + 2p\,c_{k-1}\sum_{j=k}^{\infty} c_j + 2(1-p)\,c_{k-1}\sum_{j=0}^{k-2} c_j. \tag{6}$$

Here $c = \sum_{j\ge 0} c_j$ is the total tree density and one can verify that it indeed evolves according to $dc/dt = -c^2$. The master equation (6) should be solved subject to the initial condition $c_k(0) = \delta_{k,0}$. It proves useful to introduce the cumulative fractions

$$A_k = c^{-1}\sum_{j=k}^{\infty} c_j, \tag{7}$$

and a new time variable

$$T = \int_0^t d\tau\, c(\tau) = \ln(1+t). \tag{8}$$

These variables recast Eqs. (6) into

$$\frac{dA_k}{dT} = -A_k + 2(1-p)A_{k-1} + (2p-1)A_{k-1}^2, \tag{9}$$

which should be solved subject to the step function initial conditions, $A_k(0) = 1$ for $k \le 0$ and $A_k(0) = 0$ otherwise.
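Equation (9) is straightforward to integrate numerically. The sketch below uses a forward-Euler step with $A_0 = 1$ held fixed and a finite cutoff `k_max`; the scheme, the step size, and the cutoff are assumptions made here and are not necessarily those used by the authors. The front position is measured through $\langle k\rangle = \sum_{k\ge 1} A_k$, which follows from Eq. (7) by summation by parts.

```python
def integrate_fronts(p, t_end, dt=0.01, k_max=200, record=()):
    """Forward-Euler integration of Eq. (9) for k = 1..k_max with A_0 = 1.

    Returns a dict {T: <k>} for every time T listed in `record`.
    """
    a = [1.0] + [0.0] * k_max            # step-function initial condition
    steps = int(round(t_end / dt))
    marks = {int(round(T / dt)): T for T in record}
    out = {}
    for n in range(1, steps + 1):
        prev = a[:]                       # values at the old time level
        for k in range(1, k_max + 1):
            rhs = (-prev[k] + 2.0 * (1.0 - p) * prev[k - 1]
                   + (2.0 * p - 1.0) * prev[k - 1] ** 2)
            a[k] = prev[k] + dt * rhs
        if n in marks:
            out[marks[n]] = sum(a[1:])    # <k> = sum_{k>=1} A_k
    return out

# Minimal-height case: estimate the front velocity from two snapshots.
pos = integrate_fronts(p=1.0, t_end=80.0, k_max=100, record=(40.0, 80.0))
v_est = (pos[80.0] - pos[40.0]) / 40.0
print("estimated front velocity:", v_est)
```

At finite $T$ the estimate differs from the asymptotic velocity by slowly decaying corrections, so close agreement requires long runs. For the maximal case the front moves much faster and `k_max` has to be increased accordingly.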

In the long time limit, $A_k(T)$ approaches a traveling wave form, $A_k(T) \to A(k - vT)$, with $A(x)$ being a solution of the nonlinear difference-differential equation

$$vA'(x) = A(x) - 2(1-p)A(x-1) - (2p-1)A^2(x-1), \tag{10}$$

subject to the boundary conditions $A(-\infty) = 1$ and $A(\infty) = 0$. Fortunately, the velocity $v$ can be determined without solving the nonlinear nonlocal equation (10) exactly. To determine $v$, it is enough to analyze the asymptotic behavior of the front at one of its two tails. Different considerations apply in the regions $p < 1/2$ and $p > 1/2$. We first consider the case $1/2 < p \le 1$. Here, Eq. (10) admits an exponential solution in the small-$k$ tail, $A(x) \to 1 - e^{\lambda x}$ as $x \to -\infty$. Substituting this form in Eq. (10), we find that the yet to be determined velocity $v$ and decay exponent $\lambda$ are related via

$$v = \frac{1 - 2p\,e^{-\lambda}}{\lambda}. \tag{11}$$



While a class of velocities is in principle possible, the extremum value is selected for compact initial conditions. This behavior is similar to velocity selection occurring for example in the classic Fisher reaction-diffusion equation [13]. Evaluating this extremum yields a generalization of the transcendental equation (3)

$$v\,\ln\frac{2ep}{v} = 1, \qquad p \ge \frac12. \tag{12}$$

In particular, $v \to 1$ when $p \to 1/2$. In the minimal height case ($p = 1$) we recover the aforementioned value $v_{\min} = 0.373365$. The selected decay coefficient satisfies $2p(1+\lambda) = e^{\lambda}$ and in the minimal case $\lambda = 1.67835$.

Let us now turn to the complementary $p < 1/2$ case where, contrary to the $p > 1/2$ case, the large-extremal-height tail admits an exponentially decaying solution $A(x) \to e^{-\mu x}$ as $x \to +\infty$. Here, the yet to be determined velocity and decay coefficient are related via

$$v = \frac{2(1-p)\,e^{\mu} - 1}{\mu}. \tag{13}$$

Again, applying the velocity selection principle implies that the minimal possible velocity is selected. The selected decay coefficient satisfies $2(1-p)(1-\mu) = e^{-\mu}$ and the selected velocity obeys

$$v\,\ln\frac{2e(1-p)}{v} = 1, \qquad p \le \frac12. \tag{14}$$

In the maximal case ($p = 0$) one recovers the velocity $v_{\max} = 4.31107$ and additionally, the decay coefficient is $\mu = 0.768039$. While we have not proved this selection principle, our numerical integration of Eq. (9) supports the findings in the minimal and the maximal cases. We confirmed that a traveling wave solution is indeed approached, and that the selected velocities fall within 0.1% of the theoretical values. Curiously, this is a unique case where velocity selection is also supported by an independent physical argument.
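The selection rule can be checked directly by locating the extremum of the dispersion relations numerically. The sketch below maximizes Eq. (11) for $p > 1/2$ and minimizes Eq. (13) for $p < 1/2$ with a golden-section search; the search intervals and function names are assumptions chosen for illustration, and the lower bound of the interval limits how close to $p = 1/2$ the routine can be trusted.

```python
import math

def golden_section(func, lo, hi, tol=1e-10):
    """Locate the minimum of a unimodal function on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - g * (b - a)
        d = a + g * (b - a)
        if func(c) < func(d):
            b = d     # minimum lies in [a, d]
        else:
            a = c     # minimum lies in [c, b]
    return 0.5 * (a + b)

def selected_velocity(p):
    """Return (v, decay coefficient) at the extremum of Eq. (11) or Eq. (13)."""
    if p > 0.5:
        def v_of(lam):                       # Eq. (11)
            return (1.0 - 2.0 * p * math.exp(-lam)) / lam
        lam = golden_section(lambda x: -v_of(x), 1e-3, 20.0)   # maximal velocity
        return v_of(lam), lam
    if p < 0.5:
        def v_of(mu):                        # Eq. (13)
            return (2.0 * (1.0 - p) * math.exp(mu) - 1.0) / mu
        mu = golden_section(v_of, 1e-3, 5.0)                   # minimal velocity
        return v_of(mu), mu
    return 1.0, 0.0                          # critical point p = 1/2

v1, lam1 = selected_velocity(1.0)
v0, mu0 = selected_velocity(0.0)
print(v1, lam1, v0, mu0)
```

The extremal values can then be substituted into Eqs. (12) and (14) as an independent check that the transcendental equations and the dispersion relations agree.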

Interestingly, different mechanisms drive the front in the regions $p < 1/2$ and $p > 1/2$. In the region $p > 1/2$, which includes the minimal case, the small-extremal-height tail of the distribution dictates the velocity and in fact, the entire distribution is enslaved to this exponential tail. The opposite is true for $p < 1/2$, which includes the maximal case. Here, the large-extremal-height tail of the distribution governs the wave velocity and the wave form. These behaviors are physical, especially when the limiting cases are considered: the distribution of minimal (maximal) tree heights is governed by extremely small (large) fluctuations. The point $p_c = 1/2$ can be regarded as a critical point. In particular, the different velocity equations (12) and (14) imply non-analytic behavior at $p_c = 1/2$ where $v_c = 1$, and analysis of the leading behavior in the vicinity of this point yields

$$v - v_c \simeq \mp 2\,|p - p_c|^{1/2}, \qquad \text{when } p \to p_c^{\pm}. \tag{15}$$



Exact analysis of the special case p = 1/2 is given below. 
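The square-root singularity near $p_c$ follows from a second-order expansion of the velocity equations. The sketch below treats Eq. (12) with $p = 1/2 + \epsilon$ and $v = 1 + \delta$; the symbols $\epsilon$ and $\delta$ are introduced here only for this expansion.

```latex
% Expand v [1 + ln(2p) - ln v] = 1 with ln(2p) ~ 2*epsilon, ln v ~ delta - delta^2/2.
\begin{align*}
1 &= v\,\bigl[1 + \ln(2p) - \ln v\bigr] \\
  &\simeq (1+\delta)\,\bigl[1 + 2\epsilon - \delta + \tfrac12\delta^2\bigr] \\
  &\simeq 1 + 2\epsilon - \tfrac12\delta^2
  \quad\Longrightarrow\quad \delta^2 \simeq 4\epsilon .
\end{align*}
```

Since $v$ decreases toward $v_{\min}$ as $p \to 1$, the negative root applies above the critical point, $v - 1 \simeq -2(p - p_c)^{1/2}$. The same expansion of Eq. (14) with $p = 1/2 - \epsilon$ again gives $\delta^2 \simeq 4\epsilon$, now with the positive root, $v - 1 \simeq 2(p_c - p)^{1/2}$.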

Asymptotically, while the wave front advances at a constant rate $v$, there is a slow $T^{-1}$ correction in the leading order, resulting in a logarithmic correction to the front position. A similar correction was first derived by Bramson in the context of reaction-diffusion equations [14], and was subsequently generalized [15-17]. We now calculate the leading correction employing the approach of Ref. [16]. Let us first consider the $p > 1/2$ case. Substituting $A_k(T) = 1 - a_k(T)$ into Eq. (9), and ignoring the quadratic term $a_{k-1}^2$, yields

$$\frac{da_k}{dT} = -a_k + 2p\,a_{k-1}. \tag{16}$$

Substituting the scaling solution

$$a_k(T) = T^{\alpha}\,G(xT^{-\alpha})\,e^{\lambda x}, \qquad x = k - vT - w(T), \tag{17}$$

into (16) shows that different leading orders are compatible provided that the exponent $\alpha = 1/2$ and that the correction to the front location is $w(T) = \beta\ln T$. The former constraint reflects a hidden diffusive scale and the latter gives the aforementioned logarithmic correction with yet undetermined amplitude $\beta$. Substituting these behaviors in Eq. (16) we get

$$0 = T^{1/2}\left(2pe^{-\lambda} - 1 + \lambda v\right)G(z) + \left(v - 2pe^{-\lambda}\right)G'(z) + T^{-1/2}\left[pe^{-\lambda}G''(z) + \frac{1}{2}\,zG'(z) + \frac{2\lambda\beta - 1}{2}\,G(z)\right],$$

where $z = xT^{-1/2}$.



The terms of the leading $\mathcal{O}(T^{1/2})$ order cancel when $2pe^{-\lambda} + \lambda v = 1$, i.e., when the velocity $v$ and the decay exponent $\lambda$ are related through Eq. (11). The terms of order $\mathcal{O}(1)$ cancel when $2pe^{-\lambda} = v$. Remarkably, this relation together with Eq. (11) holds only at the extremal point where $v$ and $\lambda$ are given by Eqs. (12) and (11), respectively. Finally, the terms in the lowest $\mathcal{O}(T^{-1/2})$ order cancel when the scaling function $G(z)$ satisfies the parabolic cylinder equation

$$v\,\frac{d^2G}{dz^2} + z\,\frac{dG}{dz} + (2\beta\lambda - 1)\,G(z) = 0. \tag{18}$$



This differential equation should be solved subject to the appropriate boundary conditions: (i) $G(z) \to 0$ for $z \to -\infty$ as $a_k(T)$ must vanish when $T \to \infty$, and (ii) $G(z) \sim z$ for $z \to 0$ to ensure that $a_k(T)$ is independent of $k$ for large $k$. The former boundary condition selects one of the two possible solutions, $G(z) = C\,e^{-y^2/4}D_\nu(y)$, where $y = z/\sqrt{v}$, $\nu = 2(\beta\lambda - 1)$, and $D_\nu(y)$ is the parabolic cylinder function with index $\nu$. The second boundary condition fixes the index, $\nu = 1$, implying $\beta = 3/(2\lambda)$. Hence, we find that the expected extremal tree height, $\langle k\rangle = c^{-1}\sum_k k\,c_k$, grows with $N$ as

$$\langle k\rangle \simeq v\ln N + \frac{3}{2\lambda}\ln\ln N, \tag{19}$$

where the relation $T = \ln N$ was used. Obviously, the latter log-log correction cannot be obtained from the heuristic argument presented earlier. The same analysis can be carried out for the $p < 1/2$ case where we find

$$\langle k\rangle \simeq v\ln N - \frac{3}{2\mu}\ln\ln N, \tag{20}$$

with $v$ and $\mu$ given by Eqs. (14) and (13), respectively. For completeness, we merely quote results of a more sophisticated approach [17] which allows calculation of the second leading correction to the growth rate

$$\frac{d\langle k\rangle}{dT} = \begin{cases} v + \dfrac{3}{2\lambda}\,T^{-1} + \mathcal{O}(T^{-3/2}), & p > 1/2,\\[2mm] v - \dfrac{3}{2\mu}\,T^{-1} + \mathcal{O}(T^{-3/2}), & p < 1/2. \end{cases} \tag{21}$$
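To get a feeling for the size of the double logarithmic term, Eqs. (19) and (20) can be evaluated for the two limiting cases using the values of $v$, $\lambda$, and $\mu$ quoted in the text. The sketch below drops all $\mathcal{O}(1)$ terms, so it indicates the relative weight of the two contributions rather than an accurate prediction at moderate $N$; the function and constant names are illustrative.

```python
import math

# Selected values quoted in the text for the two limiting cases.
V_MIN, LAMBDA_MIN = 0.373365, 1.67835     # p = 1
V_MAX, MU_MAX = 4.31107, 0.768039         # p = 0

def mean_min_height(n):
    """Eq. (19) at p = 1, without O(1) terms."""
    return V_MIN * math.log(n) + 1.5 / LAMBDA_MIN * math.log(math.log(n))

def mean_max_height(n):
    """Eq. (20) at p = 0, without O(1) terms."""
    return V_MAX * math.log(n) - 1.5 / MU_MAX * math.log(math.log(n))

n = 1e6
print(mean_min_height(n), 2.0 * math.log(n), mean_max_height(n))
```

For $N = 10^6$ the correction in the minimal case is roughly 2.3 against a leading term of about 5.2, so the double logarithm is far from negligible at practically relevant tree sizes.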



We now return to the critical case $p = 1/2$. The critical behavior is simpler because Eq. (9) becomes linear, and it can be easily solved by the generating function method. From Eq. (9) we find that the generating function $Q(z,T) = \sum_{k\ge 1} A_k z^k$ satisfies the differential equation $dQ/dT = z + (z-1)Q(z,T)$, subject to the initial condition $Q(z,0) = 0$. This equation admits the solution $Q(z,T) = z[1 - e^{-(1-z)T}]/(1-z)$, and expanding in powers of $z$ gives the cumulative distribution

$$A_k(T) = e^{-T}\sum_{m=k}^{\infty}\frac{T^m}{m!}. \tag{22}$$

Using Eq. (7) and $c(T) = e^{-T}$, we find that the extremal height distribution is proportional to a Poisson distribution, $c_k(T) = e^{-2T}T^k/k!$. Asymptotically, the normalized height distribution approaches a Gaussian, $c^{-1}c_k(T) \to \exp[-(k-T)^2/2T]/\sqrt{2\pi T}$, and the cumulative fractions easily follow

$$A_k(T) \to \frac12\,\mathrm{erfc}\!\left(\frac{k-T}{\sqrt{2T}}\right). \tag{23}$$

Hence, both tails are Gaussian, consistent with the fact that $\lambda = \mu = 0$ when $p = 1/2$. Interestingly, the hidden diffusive scale becomes pronounced, and the wave front broadens indefinitely with a width of the order $\sqrt{T}$.
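The exact solution at $p = 1/2$ is simple enough to verify numerically. The sketch below evaluates the Poisson tail sum of Eq. (22), checks that it satisfies the linear equation $dA_k/dT = A_{k-1} - A_k$ by a central finite difference, and compares it with the Gaussian form of Eq. (23). The truncation of the infinite sum after `terms` terms and the function names are assumptions made for illustration.

```python
import math

def cumulative_fraction(k, T, terms=400):
    """A_k(T) of Eq. (22), with the sum truncated after `terms` terms (T > 0)."""
    # first term m = k, computed in log space to avoid overflow
    term = math.exp(-T + k * math.log(T) - math.lgamma(k + 1))
    total = 0.0
    for m in range(k, k + terms):
        total += term
        term *= T / (m + 1)      # ratio of consecutive Poisson weights
    return total

def gaussian_limit(k, T):
    """Large-T form of Eq. (23)."""
    return 0.5 * math.erfc((k - T) / math.sqrt(2.0 * T))

# Finite-difference check of dA_k/dT = A_{k-1} - A_k at k = 3, T = 5.
k, T, h = 3, 5.0, 1e-5
lhs = (cumulative_fraction(k, T + h) - cumulative_fraction(k, T - h)) / (2.0 * h)
rhs = cumulative_fraction(k - 1, T) - cumulative_fraction(k, T)
print(lhs, rhs)
print(cumulative_fraction(400, 400.0), gaussian_limit(400, 400.0))
```

The agreement with the Gaussian form improves only slowly with $T$, since the discreteness of $k$ introduces corrections of relative order $T^{-1/2}$.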

Thus far, we have considered closed systems where only trivial trees are initially present. However, in many applications such as data storage algorithms, as well as in physical situations such as river networks [18] and fragmentation processes, there may be a constant input into the system. We therefore consider the natural case where an initially empty system is subject to a uniform input of trivial trees. In this case, we must add an input term $\delta_{k,0}$ to the right-hand side of Eq. (6). The initial condition now reads $c_k(0) = 0$. The overall density evolves according to $dc/dt = 1 - c^2$, i.e., $c(t) = \tanh(t)$. Hence, the system eventually reaches a steady state with density $c = 1$.

We restrict our attention to the steady state distributions which can be obtained by equating the time derivatives in the master equations (6) to zero. Again, we introduce the cumulative densities $B_k = \sum_{j=k}^{\infty} c_j$ which at the steady state satisfy $B_0 = c = 1$ and

$$B_k = (1-p)\,B_{k-1} + \left(p - \frac12\right)B_{k-1}^2 \tag{24}$$

for $k \ge 1$. Three different behaviors arise depending on whether $p = 1$, $0 < p < 1$, or $p = 0$. For $p = 1$, solving (24) recursively gives $B_k = 2^{-(2^k - 1)}$. Therefore, $c_k = 2^{-(2^k-1)}\left(1 - 2^{-2^k}\right)$, implying that the minimal height distribution decays as an unusual double-exponential. For $0 < p < 1$, one can neglect the nonlinear term in Eq. (24) in the large $k$ limit. Thence, $B_k \sim (1-p)^k$, implying a generic exponential decay of the distribution $c_k$ at large $k$. The critical behavior disappears and the only notable feature of the $p = 1/2$ case is that it is exactly solvable, $B_k = 2^{-k}$. For the maximal height problem ($p = 0$), the recursion (24) simplifies (in the large $k$ limit) to a differential equation $dB/dk = -B^2/2$ which is solved to give $B_k \simeq 2k^{-1}$. Thus, the maximal height distribution exhibits a power-law decay, $c_k \simeq 2k^{-2}$ for large $k$. To summarize, we quote (up to a numeric prefactor) the three leading large-$k$ behaviors

$$c_k \sim \begin{cases} 2^{-2^k} & p = 1,\\ (1-p)^k & 0 < p < 1,\\ k^{-2} & p = 0. \end{cases} \tag{25}$$

Thus, input dramatically alters the height distributions with a wide array of possible outcomes including double exponential, exponential, and algebraic decays.
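The three regimes can be reproduced by iterating the steady-state recursion (24) directly. The sketch below uses exact rational arithmetic for $p = 1$ and $p = 1/2$, where closed forms are quoted, and floating point for $p = 0$, where many iterations are needed. The function name and the `one` argument, which only selects the number type, are choices made here for illustration.

```python
from fractions import Fraction

def steady_state_cumulative(p, k_max, one=1.0):
    """Iterate Eq. (24) from B_0 = 1 and return [B_0, ..., B_{k_max}].

    Pass one=Fraction(1) together with a Fraction p for exact arithmetic.
    """
    b = [one]
    for _ in range(k_max):
        prev = b[-1]
        b.append((1 - p) * prev + (p - one / 2) * prev * prev)
    return b

b_min = steady_state_cumulative(Fraction(1), 6, one=Fraction(1))       # p = 1
b_crit = steady_state_cumulative(Fraction(1, 2), 10, one=Fraction(1))  # p = 1/2
b_max = steady_state_cumulative(0.0, 10000)                            # p = 0

print(b_min[:4])                 # compare with 2^{-(2^k - 1)}
print(b_crit[:4])                # compare with 2^{-k}
print(10000 * b_max[10000])      # compare with the asymptotic value 2
```

For $p = 0$ the product $k B_k$ approaches 2 from below, with a deviation that decays only as $\ln k / k$, so large $k$ is needed to see the algebraic tail cleanly.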

In summary, we have shown that extremal properties of random trees can be obtained by analyzing the corresponding nonlinear evolution equations. The cumulative distributions of extremal tree heights approach a traveling wave solution. The mean extremal values grow logarithmically with the tree size and there is an additional weak double logarithmic correction. The corresponding growth velocities were obtained from an elementary probabilistic argument and from an extremum selection criterion on the traveling wave solution. Interestingly, while the traveling wave velocity and form in the minimal case are determined by extremely small height fluctuations, the opposite holds for the maximal case. The transition between these two behaviors is marked by a sharp phase transition in a model which interpolates between the two extremal characteristics. Additionally, we have shown that the presence of input may lead to double exponential, exponential, or even algebraic decays of the extremal height distribution. It will be interesting to apply rate equation theory to closely related random tree characteristics, e.g., the bifurcation ratio and the rank [18].